Last week I introduced the rnaturalearth package and how it provides nice country, state, and shoreline data for making maps.
Just like before, let’s obtain country data using the ne_countries() function.
A choropleth is a very common style of showing spatial data in which we color individual regions in a map according to some data that we want to display. The country data from rnaturalearth contains lots of attribute columns, including estimates of population (the “pop_est” column). In this exercise, we want to make a choropleth map of the African continent showing the population of each country.
We can use filter on the world data, as we would on any other data frame. There’s a convenient column named “continent” that we can use for the filter.
Creating the map is easy: just add in a call to geom_sf() with the africa data and map fill to pop_est. But there’s one annoying issue that we have to fix. The pop_est column is stored as text rather than as a number. If we don’t convert it to a number, ggplot2 uses a qualitative color scheme, which is the default when fill or color is mapped to some text variable.
Obviously not what we want. When we map fill to pop_est, we can just do the conversion on the fly by wrapping pop_est in the function as.numeric().
Let’s improve the plot similar to how we styled the Katrina plot: blue for the surrounding water, a border around the map area, and a nice sequential color scheme from the rcartocolor package. Let’s also add country labels, which are stored in a column called “name”, using the geom_sf_text() function.
Fill the countries usind the “Mint” color scheme from rcartocolor.
That certainly looks better, but there are still perceptual problems with this map. In particular, Africa is not surrounded by endless ocean, and we have a lot of space on the bottom because of South Africa’s Prince Edward Islands, which are about half way between South Africa and Antarctica.
We can fix these issues by:
coord_sf() (otherwise, by adding all other countries, we’ll see the entire world)We can get the bounding box around our africa object like this:
## xmin ymin xmax ymax
## -25.34155 -46.96289 51.39023 37.34038
Then, we can plug in those numbers for xlim and ylim in coord_sf(), shaving a bit off the ymin value so that the map doesn’t extend so far south. Finally, since this map is built from multiple data sets, we should now provide the data at the geom_ level.
Fill the countries using an appropriate color scheme of your choice.
Note that I have added lots of different theme and guide customizations just to give you an idea of what’s possible.
You might already have geospatial data that you want to map in the form of a shapefile (a file with the .shp extension). For example, the IUCN provides spatial data on the known geographic ranges of many plant and animal species. These are huge downloads, but I have prepared a small subset for you called iucn_primates.shp that contains only non-human primates. We can use the read_sf() function from the sf package to read in the data:
## Simple feature collection with 417 features and 2 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -99.2274 ymin: -34.83983 xmax: 141.9292 ymax: 41.54623
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 417 x 3
## BINOMIAL FAKE_ID geometry
## <chr> <dbl> <MULTIPOLYGON [°]>
## 1 Allenopithecus n… 1 (((22.49295 2.161146, 22.5617 2.039266, 22.70…
## 2 Allocebus tricho… 2 (((49.9361 -14.38416, 49.9741 -14.46636, 50.1…
## 3 Alouatta arctoid… 3 (((-69.4574 6.108936, -69.476 6.099036, -69.5…
## 4 Alouatta belzebul 4 (((-35.0807 -6.206164, -35.0746 -6.220864, -3…
## 5 Alouatta caraya 5 (((-42.2595 -10.18856, -42.4535 -11.11916, -4…
## 6 Alouatta discolor 6 (((-51.7193 -3.243664, -51.7056 -3.307864, -5…
## 7 Alouatta guariba 7 (((-38.3472 -12.43216, -38.1769 -12.68766, -3…
## 8 Alouatta juara 8 (((-72.2923 11.45834, -72.1034 11.39704, -72.…
## 9 Alouatta macconn… 9 (((-57.1405 5.849836, -57.1373 5.835936, -57.…
## 10 Alouatta nigerri… 10 (((-55.1808 -2.579764, -55.1808 -2.723464, -5…
## # … with 407 more rows
This is stored as a tidy data frame, similar to other data that we have worked with, with a column for each attribute. But there is an additional column called “geometry” that contains the relevant spatial information, along with some spatial metadata such as the geographic projection and the bounding box.
Since it functions as a normal data frame, we can perform operations on it, such as filtering for particular species (the column in the data that contains species is called “BINOMIAL”). Let’s get the range for the white-faced capuchin, Cebus capucinus, and also get the bounding box for this species.
To avoid having to manually write in the xlim and ylim values, as we did for the Africa example above, we can refer to the cc_bbox object that we created, referencing the specific values that we want to extract.
And of course, we can do all the other nifty things that we do with normal ggplot2 plots, such as faceting. Let’s filter so that we pull out all gibbon species belonging to the genus Hylobates, and then create a faceted map of their geographic ranges. The str_detect() function is part of the stringr package, which is a part of the tidyverse. It tests for the presence of a sequence of characters, and returns TRUE if the sequence is detected. In this case, the filter statement below tests each value in BINOMIAL, and retain rows where the string “Hylobates” was detected.
Note how in this plot I have italicized the facet titles using strip.text = element_text(face = "italic") in the theme() function.
Use a different fill for each gibbon species, and use the “Dark2” palette from RColorBrewer.
Another spatial task that you might be faced with is plotting location points relative to other spatial features in some small geographic area. For example, you might plot the locations of artifacts collected at an archaeological site in relation to roads, buildings, etc. In this example, we’re going to plot GPS location points of five groups of capuchin monkeys over a 1-year time period along with streams, trails, and roads to gain insights about their movement patterns.
The data include:
| focal_group | season | year | date | waypoint | seqnum | activity | altitude | lon | lat |
|---|---|---|---|---|---|---|---|---|---|
| Los Valles | DC | 2011 | 1/6/2011 15:15:58 | R_110106_LV_FC_001 | 1 | FI | 300.3 | -85.61092 | 10.84384 |
| Los Valles | DC | 2011 | 1/6/2011 16:01:39 | R_110106_LV_FC_002 | 1 | SP | 286.0 | -85.61085 | 10.84282 |
| Los Valles | DC | 2011 | 1/6/2011 16:16:29 | R_110106_LV_FC_004 | 2 | FO | 293.5 | -85.61038 | 10.84256 |
| Los Valles | DC | 2011 | 1/6/2011 16:31:26 | R_110106_LV_FC_005 | 3 | FF | 302.4 | -85.61033 | 10.84112 |
| Los Valles | DC | 2011 | 1/6/2011 16:45:30 | R_110106_LV_FC_006 | 4 | TT | 302.3 | -85.61114 | 10.84031 |
| Los Valles | DC | 2011 | 1/6/2011 17:00:08 | R_110106_LV_FC_007 | 5 | TT | 312.7 | -85.61131 | 10.83832 |
First, to get oriented, let’s create a basic plot that includes all four data sets. Our X-Y coordinates are called “lon” and “lat”, respectively. For now, we’re not going to set limits using coord_sf().
All the points are concentrated in one relatively small region of the study area. We can’t use the bounding box of any of our spatial features (trails, roads, streams), as we have previously done, because they all extend well beyond the points. To zoom in on the points, we could manually set the limits, or we could convert the points into a spatial feature and then extract the bounding box of the points. Let’s do the latter.
In the capuchin ranging data, our x-y values are longitude and latitude coordinates on the WGS84 global reference system. The standard way of referring to this very common coordinate system is by its EPSG identifier, which is 4326. Explaining all this in detail is beyond the scope of this class, but it was summarized in one of your assigned readings.
To make a spatial feature out of the columns in our data, we use the function st_as_sf() (from the sf package), and we must supply it with the x and y columns, as well as the coordinate reference system. After that, we can get the bounding box.
Note that I am still plotting the points using the non-spatial cap_df data with geom_point() rather than the spatial cap_sf data with geom_sf(). I do this because geom_sf() is excruciatingly slow compared to geom_point() when you have a large number of points. If we replace the geom_point() stuff in the code below with geom_sf(data = cap_sf, aes(color = focal_group), size = 0.5), we would get exactly the same plot but you’d be significantly more annoyed right now because of the delay.
Also, since the coordinates aren’t really useful here, given how small the area is, let’s just turn off the graticule and lat/long values along the axes by providing using datum = NA inside of coord_sf().
Facet the map by group and show each group in a different color using the rcartocolor palette “Bold”.
We can also make use of R’s various analysis capabilities to make interesting plots from dense point data, such as 2D kernel density estimates of home ranges for each group.
Fill the density polygons using the colorspace palette “Heat”.
Or a hexagonal heatmap of 2D bin counts.
Fill the hexagons using the colorspace palette “Purple-Orange”.